Abstract: The paper introduces scaled Bregman distances of probability distributions which admit non-uniform contributions of observed events. They are introduced in a general form covering not only the distances of discrete and continuous stochastic observations, but also the distances of random processes and signals. It is shown that the scaled Bregman distances extend not only the classical ones studied in the previous literature, but also the information divergence and the related wider class of convex divergences of probability measures. An information processing theorem is established too, but only in the sense of invariance w.r.t. statistically sufficient transformations and not in the sense of universal monotonicity. Pathological situations where coding can increase the classical Bregman distance are illustrated by a concrete example. In addition to the classical areas of application of the Bregman distances and convex divergences such as recognition, classification, learning and evaluation of proximity of various features and signals, the paper mentions a new application in 3D-exploratory data analysis. Explicit expressions for the scaled Bregman distances are obtained in general exponential families, with concrete applications in the binomial, Poisson and Rayleigh families, and in the families of exponential processes such as the Poisson and diffusion processes including the classical examples of the Wiener process and geometric Brownian motion.

Index Terms: Bregman distances, classification, divergences, exponential distributions, exponential processes, information retrieval, machine learning, statistical decision, sufficiency.

I. Introduction 

Bregman (1967) introduced for convex functions $\phi : \mathbb{R}^d \to \mathbb{R}$ with gradient $\nabla\phi$ the $\phi$-depending nonnegative measure of dissimilarity

$$B_\phi(p,q) = \phi(p) - \phi(q) - \nabla\phi(q)\,(p - q) \tag{1}$$

of $d$-dimensional vectors $p, q \in \mathbb{R}^d$. His motivation was the problem of convex programming, but in the subsequent literature it became widely applied in many other problems under the name Bregman distance, in spite of the fact that it is not in general a metric distance (it is a pseudodistance which is reflexive but neither symmetric nor satisfying the triangle inequality). The most important feature is the special separable form defined by

$$B_\phi(p,q) = \sum_{i=1}^{d} \left[\phi(p_i) - \phi(q_i) - \phi'(q_i)(p_i - q_i)\right] \tag{2}$$

for vectors $p = (p_1, \ldots, p_d)$, $q = (q_1, \ldots, q_d)$ and convex differentiable functions $\phi : \mathbb{R} \to \mathbb{R}$. For example, the function $\phi(t) = (t-1)^2$ leads to the classical squared Euclidean distance

$$B_\phi(p,q) = \sum_{i=1}^{d} (p_i - q_i)^2. \tag{3}$$

In the optimization-theoretic context the Bregman distances are usually studied in the general form (1); see, e.g., Csiszar and Matus (2008, 2009), as well as Bauschke and Borwein (1997) for adjacent random projection studies. In the information-theoretic or statistical context they are typically used in the separable form (2) for vectors $p, q$ with nonnegative coordinates representing generalized distributions (finite discrete measures) and functions $\phi : [0,\infty) \to \mathbb{R}$ differentiable on $(0,\infty)$ (the problem with $q_i = 0$ is solved by resorting to the right-hand derivative $\phi'_+(0)$). The concrete example $\phi(t) = t \ln t$ leads to the well-known Kullback divergence

$$B_\phi(p,q) = \sum_{i=1}^{d} p_i \ln\frac{p_i}{q_i}.$$

Of course, the most common context is that of discrete probability distributions $p, q$, since vectors of hypothetical or observed frequencies $p, q$ are easily transformed to relative frequencies normed to 1. For example, Csiszar (1991, 1994, 1995) or Pardo and Vajda (1997, 2003) used the Bregman distances of probability distributions in the context of information theory and asymptotic statistics.
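
The separable form (2) and its two special cases above can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper name `bregman` and the vectors `p`, `q` are illustrative assumptions, and both vectors are taken to be probability vectors with positive entries.

```python
import math

def bregman(p, q, phi, dphi):
    """Separable Bregman distance of form (2) for vectors with positive entries."""
    return sum(phi(a) - phi(b) - dphi(b) * (a - b) for a, b in zip(p, q))

# Illustrative probability vectors (an assumption, not data from the paper).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# phi(t) = (t - 1)^2 reproduces the squared Euclidean distance (3).
sq_bregman = bregman(p, q, lambda t: (t - 1) ** 2, lambda t: 2 * (t - 1))
sq_direct = sum((a - b) ** 2 for a, b in zip(p, q))

# phi(t) = t ln t reproduces the Kullback divergence when p and q both sum to 1.
kl_bregman = bregman(p, q, lambda t: t * math.log(t), lambda t: math.log(t) + 1)
kl_direct = sum(a * math.log(a / b) for a, b in zip(p, q))

print(sq_bregman, sq_direct)
print(kl_bregman, kl_direct)
```

For vectors that do not sum to the same total, the $t \ln t$ case picks up the extra term $\sum_i (q_i - p_i)$, which is why the sketch restricts itself to probability vectors.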

Important alternatives to the Bregman distances (2) are the $\phi$-divergences defined by

$$D_\phi(p,q) = \sum_{i=1}^{d} q_i\, \phi\!\left(\frac{p_i}{q_i}\right) \tag{4}$$

for functions $\phi$ which are convex on $[0,\infty)$, continuous on $(0,\infty)$ and strictly convex at 1 with $\phi(1) = 0$. Originating in the paper of Csiszar (1963), they share some properties with the Bregman distances (2), e.g., they are pseudodistances too. For example, the above considered functions $\phi(t) = (t-1)^2$ and $\phi(t) = t \ln t$ lead in this case to the classical Pearson divergence

$$D_\phi(p,q) = \sum_{i=1}^{d} \frac{(p_i - q_i)^2}{q_i} \tag{5}$$

and the above mentioned Kullback divergence $D_\phi(p,q) = B_\phi(p,q)$, which are asymmetric in $p, q$ and violate the triangle inequality. On the other hand, $\phi(t) = |t - 1|$ leads to the $L_1$-norm $\|p - q\|$ which is a metric distance, and $\phi(t) = (t-1)^2/(t+1)$ defines the LeCam divergence

$$D_\phi(p,q) = \sum_{i=1}^{d} \frac{(p_i - q_i)^2}{p_i + q_i}$$

which is a squared metric distance (for more about the metricity of $\phi$-divergences the reader is referred to Vajda (2009)).
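
The four generators mentioned above can be plugged into definition (4) and compared with the closed-form sums. This is a small sketch under the assumption of strictly positive probability vectors; the function name `phi_divergence` and the test vectors are illustrative and not taken from the paper.

```python
import math

def phi_divergence(p, q, phi):
    """phi-divergence of form (4): sum of q_i * phi(p_i / q_i), all q_i > 0."""
    return sum(b * phi(a / b) for a, b in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

pearson = phi_divergence(p, q, lambda t: (t - 1) ** 2)
kullback = phi_divergence(p, q, lambda t: t * math.log(t))
l1 = phi_divergence(p, q, lambda t: abs(t - 1))
lecam = phi_divergence(p, q, lambda t: (t - 1) ** 2 / (t + 1))

# Closed forms quoted in the text, evaluated directly.
pearson_direct = sum((a - b) ** 2 / b for a, b in zip(p, q))
kullback_direct = sum(a * math.log(a / b) for a, b in zip(p, q))
l1_direct = sum(abs(a - b) for a, b in zip(p, q))
lecam_direct = sum((a - b) ** 2 / (a + b) for a, b in zip(p, q))

print(pearson, kullback, l1, lecam)
```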

However, there are also some sharp differences between these two types of pseudodistances of distributions. One distinguishing property of Bregman distances is that their use as a loss criterion induces the conditional expectation as the unique optimal predictor from given data (cf. Banerjee et al. (2005a)); this is for instance used in Banerjee et al. (2005b) for designing generalizations of the k-means algorithm, which deals with the special case of squared Euclidean error (3) (cf. the seminal work of Lloyd (1982) reprinting a Technical Report of Bell Laboratories dated 1957). These features are generally not shared by those $\phi$-divergences which are not Bregman distances, e.g., by the Pearson divergence (5). On the other hand, a distinguishing property of $\phi$-divergences is the information processing property, i.e., the impossibility of increasing the value $D_\phi(p,q)$ by transformations of the observations distributed by $p, q$, and the preservation of this value by statistically sufficient transformations (Csiszar (1967), see in this respect also Liese and Vajda (2006)). This property is not shared by the Bregman distances which are not $\phi$-divergences. For example, the distributions $p = (1/2, 1/4, 1/4)$ and $q = (1, 0, 0)$ are mutually closer (less discernible) in the Euclidean sense (3) than their reductions $\tilde p = (1/2, 1/2)$ and $\tilde q = (1, 0)$ obtained by merging the second and third observation outcomes into one.
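
The example just given can be reproduced in a few lines. The sketch below (helper names are illustrative assumptions) evaluates the squared Euclidean distance (3) before and after merging, and, for contrast, the $L_1$ distance, which the text identifies as a $\phi$-divergence and which therefore cannot increase under merging.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance (3)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def l1_distance(p, q):
    """L1 distance, the phi-divergence generated by phi(t) = |t - 1|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def merge_last_two(v):
    """Merge the second and third outcomes of a 3-point distribution."""
    return [v[0], v[1] + v[2]]

p, q = [0.5, 0.25, 0.25], [1.0, 0.0, 0.0]
p_red, q_red = merge_last_two(p), merge_last_two(q)

print(squared_euclidean(p, q), squared_euclidean(p_red, q_red))  # 0.375 0.5
print(l1_distance(p, q), l1_distance(p_red, q_red))              # 1.0 1.0
```

Merging raises the squared Euclidean distance from 3/8 to 1/2, while the $L_1$ distance stays at 1.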

Depending on the need to exploit one or the other of these distinguishing properties, the Bregman distances or Csiszar divergences are preferred, and both of them are widely applied in important areas of information theory, statistics and computer science, for example in

(Ai) information retrieval (see, e.g., Do and Vetterli (2002), Hertz et al. (2004)),

(Aii) optimal decision (for general decision see, e.g., Boratynska (1997), Freund et al. (1997), Bartlett et al. (2006), Vajda and Zvarova (2007), for speech processing see, e.g., Carlson and Clements (1991), Veldhuis and Klabers (2002), for image processing see, e.g., Xu and Osher (2007), Marquina and Osher (2008), Scherzer et al. (2008)),

(Aiii) machine learning (see, e.g., Lafferty (1999), Banerjee et al. (2005), Amari (2007), Teboulle (2007), Nock and Nielsen (2009)), and

(Aiv) parallel optimization and computing (see, e.g., Censor and Zenios (1997)).

In this context, the importance of functionals of distributions which are simultaneously divergences in both the Csiszar and Bregman sense is obvious, as is, more broadly, the importance of research into the relations between the Csiszar and Bregman divergences. This paper is devoted to this research. It generalizes the separable Bregman distances (2) as well as the $\phi$-divergences (4) by introducing the scaled Bregman distances which for the discrete setup reduce to

$$B_\phi(p,q\,|\,m) = \sum_{i=1}^{d} \left[\phi\!\left(\frac{p_i}{m_i}\right) - \phi\!\left(\frac{q_i}{m_i}\right) - \phi'_+\!\left(\frac{q_i}{m_i}\right)\left(\frac{p_i}{m_i} - \frac{q_i}{m_i}\right)\right] m_i \tag{6}$$

for arbitrary finite scale vectors $m = (m_1, \ldots, m_d)$, convex functions $\phi$ and right-hand derivatives $\phi'_+$. Obviously, the uniform scales $m = (1, \ldots, 1)$ lead to the Bregman distances (2) and the probability distribution scales $m = q = (q_1, \ldots, q_d)$ lead to the $\phi$-divergences (4). We shall work out further interesting relations of the $B_\phi(p,q\,|\,m)$ distances to the $\phi$-divergences $D_\phi(p,q)$ and $D_\phi(p,m)$ and evaluate explicit formulas for the stochastically scaled Bregman distances in arbitrary exponential families of distributions, including also the non-discrete setup.
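
The two reductions of (6) stated above (uniform scale gives (2), scale $m = q$ gives (4)) can be illustrated numerically. This is a hedged sketch: `scaled_bregman` is a hypothetical helper name, the vectors are arbitrary positive probability vectors, and the generator $\phi(t) = (t-1)^2$ is chosen because its scaled distance visibly depends on $m$.

```python
def scaled_bregman(p, q, m, phi, dphi):
    """Discrete scaled Bregman distance of form (6); dphi plays the role of phi'_+."""
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

phi = lambda t: (t - 1) ** 2
dphi = lambda t: 2 * (t - 1)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

b_unit = scaled_bregman(p, q, [1.0, 1.0, 1.0], phi, dphi)   # uniform scale
b_q = scaled_bregman(p, q, q, phi, dphi)                     # scale m = q
b_mix = scaled_bregman(p, q, [(a + b) / 2 for a, b in zip(p, q)], phi, dphi)

squared_euclid = sum((a - b) ** 2 for a, b in zip(p, q))     # form (3)
pearson = sum((a - b) ** 2 / b for a, b in zip(p, q))        # form (5)

print(b_unit, squared_euclid)
print(b_q, pearson)
print(b_mix)
```

The third value differs from the first two, which illustrates that the scale $m$ genuinely changes the distance for this generator.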

Section II defines the $\phi$-divergences $D_\phi(P, M)$ of general probability measures $P$ and arbitrary finite measures $M$ and briefly reviews their basic properties. Section III introduces scaled Bregman distances $B_\phi(P, Q\,|\,M)$ and investigates their relations to the $\phi$-divergences $D_\phi(P, Q)$ and $D_\phi(P, M)$. Section IV studies in detail the situation where all three measures $P, Q, M$ are from the family of general exponential distributions. Finally, Section V illustrates the results by investigating concrete examples of $P, Q, M$ from classical statistical families as well as from a family of important random processes.

Notational conventions: Throughout the paper, $\mathfrak{M}$ denotes the space of all finite measures on a measurable space $(\mathcal{X}, \mathcal{A})$ and $\mathfrak{P} \subset \mathfrak{M}$ the subspace of all probability measures. Unless otherwise explicitly stated, $P, Q, M$ are mutually measure-theoretically equivalent measures on $(\mathcal{X}, \mathcal{A})$ dominated by a $\sigma$-finite measure $\lambda$ on $(\mathcal{X}, \mathcal{A})$. Then the densities

$$p = \frac{dP}{d\lambda}, \qquad q = \frac{dQ}{d\lambda} \qquad \text{and} \qquad m = \frac{dM}{d\lambda} \tag{7}$$

have a common support which will be identified with $\mathcal{X}$ (i.e., the densities (7) are positive on $\mathcal{X}$). Unless otherwise explicitly stated, it is assumed that $P, Q \in \mathfrak{P}$, $M \in \mathfrak{M}$ and that $\phi : (0,\infty) \to \mathbb{R}$ is a continuous and convex function. It is known that then the possibly infinite extension $\phi(0) = \lim_{t \downarrow 0} \phi(t)$ and the right-hand derivatives $\phi'_+(t)$ for $t \in [0,\infty)$ exist, and that the adjoint function

$$\phi^*(t) = t\,\phi(1/t) \tag{8}$$

is continuous and convex on $(0,\infty)$ with possibly infinite extension $\phi^*(0)$. We shall assume that $\phi(1) = \phi^*(1) = 0$.

II. DIVERGENCES
For $P \in \mathfrak{P}$ and $M \in \mathfrak{M}$ we consider

$$D_\phi(P,M) = \int_{\mathcal{X}} \phi\!\left(\frac{p}{m}\right) dM = \int_{\mathcal{X}} m\,\phi\!\left(\frac{p}{m}\right) d\lambda \qquad \text{(cf. (4))} \tag{9}$$

generated by the same convex functions $\phi$ as considered in the formula (4) for discrete $P$ and $M$. An important special case is $D_\phi(P,Q)$ with $Q \in \mathfrak{P}$.

The existence (but possible infinity) of the $\phi$-divergences follows from the bounds

$$\phi'_+(1)\,(p - m) \le m\,\phi\!\left(\frac{p}{m}\right) \le m\,\phi(0) + p\,\phi^*(0) \tag{10}$$

on the integrand, leading to the $\phi$-divergence bounds

$$\phi'_+(1)\,(1 - M(\mathcal{X})) \le D_\phi(P,M) \le M(\mathcal{X})\,\phi(0) + \phi^*(0). \tag{11}$$

The integrand bounds (10) follow by putting $s = 1$ and $t = p/m$ in the inequality

$$\phi(s) + \phi'_+(s)\,(t - s) \le \phi(t) \le \phi(0) + t\,\phi^*(0), \tag{12}$$

where the left-hand side is the well-known support line of $\phi(t)$ at $t = s$. The right-hand inequality is obvious for $\phi(0) = \infty$. If $\phi(0) < \infty$ then it follows by taking $s \to \infty$ in the inequality

$$\phi(t) \le \phi(0) + t\,\frac{\phi(s) - \phi(0)}{s}$$

obtained from the Jensen inequality for $\phi(t)$ situated between $\phi(0)$ and $\phi(s)$. Since the function $\psi(p,m) = m\,\phi(p/m)$ is homogeneous of order 1 in the sense $\psi(tp, tm) = t\,\psi(p,m)$ for all $t > 0$, the divergences (9) do not depend on the choice of the dominating measure $\lambda$.

Notice that $D_\phi(P,M)$ might be negative. For probability measures $P, Q$ the bounds (11) take on the form

$$0 \le D_\phi(P,Q) \le \phi(0) + \phi^*(0), \tag{13}$$

and the equalities are achieved under well-known conditions (cf. Liese and Vajda (1987), (2006)): the left equality holds if $P = Q$, and the right one holds if $P \perp Q$ (singularity). Moreover, if $\phi(t)$ is strictly convex at $t = 1$, the first "if" can be replaced by "iff", and in the case $\phi(0) + \phi^*(0) < \infty$ also the second "if" can be replaced by "iff".

An alternative to the left-hand inequality in (11), which extends the left-hand inequality in (13) including the conditions for the equality, is given by the following statement (for a systematic theory of $\phi$-divergences of finite measures we refer to the recent paper of Stummer and Vajda (2010)).

Lemma 1: For every $P \in \mathfrak{P}$, $M \in \mathfrak{M}$ one gets the lower divergence bound

$$M(\mathcal{X})\,\phi\!\left(\frac{1}{M(\mathcal{X})}\right) \le D_\phi(P,M), \tag{14}$$

where the equality holds if

$$p = \frac{m}{M(\mathcal{X})} \qquad P\text{-a.s.} \tag{15}$$

If $D_\phi(P, M) < \infty$ and $\phi(t)$ is strictly convex at $t = 1/M(\mathcal{X})$, the equality in (14) holds if and only if (15) holds.

Proof: By (9) and the definition (8) of the convex function $\phi^*$,

$$D_\phi(P,M) = \int_{\mathcal{X}} \phi^*\!\left(\frac{m}{p}\right) dP.$$

Hence by Jensen's inequality

$$D_\phi(P,M) \ge \phi^*\!\left(\int_{\mathcal{X}} \frac{m}{p}\, dP\right) = \phi^*(M(\mathcal{X})) \tag{16}$$

which proves the desired inequality (14). Since

$$\frac{m}{p} = M(\mathcal{X}) \qquad P\text{-a.s.}$$

is the condition for equality in (16), the rest is clear from the easily verifiable fact that $\phi^*(t)$ is strictly convex at $t = s$ if and only if $\phi(t)$ is strictly convex at $t = 1/s$. □
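
Lemma 1 can be illustrated in the discrete setting with counting measure as the dominating measure. The sketch below is an illustration under stated assumptions: the generator $\phi(t) = t \ln t$, the probability vector `p` and the finite (non-probability) measure `m` are arbitrary choices, and `divergence_to_measure` is a hypothetical helper name.

```python
import math

def divergence_to_measure(p, m, phi):
    """Discrete D_phi(P, M) as in (9): sum of m_i * phi(p_i / m_i)."""
    return sum(c * phi(a / c) for a, c in zip(p, m))

phi = lambda t: t * math.log(t)

p = [0.2, 0.5, 0.3]          # probability vector
m = [0.6, 0.9, 1.5]          # finite measure with total mass 3
total = sum(m)

lower_bound = total * phi(1.0 / total)       # left-hand side of (14)
d = divergence_to_measure(p, m, phi)

# Equality case (15): p proportional to m.
p_equal = [c / total for c in m]
d_equal = divergence_to_measure(p_equal, m, phi)

print(lower_bound, d, d_equal)
```

In this example the divergence is negative, which is consistent with the remark that $D_\phi(P,M)$ might be negative when $M$ is not a probability measure.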

For some of the representation investigations below, it will also be useful to take into account that for probability measures $P, Q$ we get directly from definition (9) the "skew symmetry" $\phi$-divergence formula

$$D_{\phi^*}(P,Q) = D_\phi(Q,P),$$

as well as the sufficiency of the condition

$$\phi(t) - \phi^*(t) = \text{constant} \cdot (t - 1) \tag{17}$$

for the $\phi$-divergence symmetry

$$D_\phi(P,Q) = D_\phi(Q,P) \qquad \text{for all } P, Q. \tag{18}$$

Liese and Vajda (1987) proved that under the assumed strict convexity of $\phi(t)$ at $t = 1$ the condition (17) is not only sufficient but also necessary for the symmetry (18).
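
The skew symmetry and the symmetry condition (17) can be checked on small discrete examples. This is a sketch with illustrative names (`phi_divergence`, `adjoint`) and arbitrary test vectors; it uses $\phi(t) = t \ln t$, whose adjoint is $-\ln t$, and the LeCam generator, which coincides with its own adjoint.

```python
import math

def phi_divergence(p, q, phi):
    """Discrete phi-divergence: sum of q_i * phi(p_i / q_i)."""
    return sum(b * phi(a / b) for a, b in zip(p, q))

def adjoint(phi):
    """Adjoint function (8): phi*(t) = t * phi(1 / t)."""
    return lambda t: t * phi(1.0 / t)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl_gen = lambda t: t * math.log(t)
lecam_gen = lambda t: (t - 1) ** 2 / (t + 1)

# Skew symmetry: D_{phi*}(P, Q) = D_phi(Q, P).
skew_left = phi_divergence(p, q, adjoint(kl_gen))
skew_right = phi_divergence(q, p, kl_gen)

# The LeCam generator satisfies (17) with constant 0, so its divergence is symmetric.
lecam_pq = phi_divergence(p, q, lecam_gen)
lecam_qp = phi_divergence(q, p, lecam_gen)

print(skew_left, skew_right)
print(lecam_pq, lecam_qp)
```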

III. SCALED BREGMAN DISTANCES 

Let us now introduce the basic concept of the current paper, which is a measure-theoretic version of the Bregman distance (6). In this definition it is assumed that $\phi$ is a finite convex function in the domain $t > 0$, continuously extended to $t = 0$. As before, $\phi'_+(t)$ denotes the right-hand derivative which for such $\phi(t)$ exists, and $p, q, m$ are the densities defined in (7).

Definition 1: The Bregman distance of probability measures $P, Q$ scaled by an arbitrary measure $M$ on $(\mathcal{X}, \mathcal{A})$ measure-theoretically equivalent with $P, Q$ is defined by the formula

$$B_\phi(P,Q\,|\,M) = \int_{\mathcal{X}} \left[\phi\!\left(\frac{p}{m}\right) - \phi\!\left(\frac{q}{m}\right) - \phi'_+\!\left(\frac{q}{m}\right)\left(\frac{p}{m} - \frac{q}{m}\right)\right] dM = \int_{\mathcal{X}} \left[m\,\phi\!\left(\frac{p}{m}\right) - m\,\phi\!\left(\frac{q}{m}\right) - \phi'_+\!\left(\frac{q}{m}\right)(p - q)\right] d\lambda. \tag{19}$$

The convex function $\phi$ under consideration can be interpreted as a generating function of the distance.

Remarks 1: (1) By putting $t = p/m$ and $s = q/m$ in (12) we find the argument of the integral in (19) to be nonnegative. Hence the Bregman distance $B_\phi(P, Q\,|\,M)$ is well-defined by (19) and is always nonnegative (possibly infinite).

(2) Notice that the integrand in the first (respectively second) integral of (19) constitutes a function, say, $\Upsilon(p,q,m)$ (respectively $\tilde\Upsilon(p,q,m)$) which is homogeneous of order 0 (respectively order 1), i.e., for all $t > 0$ there holds $\Upsilon(tp,tq,tm) = \Upsilon(p,q,m)$ (respectively $\tilde\Upsilon(tp,tq,tm) = t \cdot \tilde\Upsilon(p,q,m)$). Analogously, as already partially indicated above, the integrand in the first (respectively second) integral of (9) is also a function, say, $\tilde\psi(p,m)$ (respectively $\psi(p,m)$) which is homogeneous of order 0 (respectively order 1).

(3) In our measure-theoretic context (19) we have incorporated the possible non-differentiability of $\phi$ by using its right-hand derivative, which will be essential at several places below. For general Banach spaces, one typically employs various directional derivatives; see, e.g., Butnariu and Resmerita (2006) in connection with different types of convexity properties.

The special scaled Bregman distances $B_\phi(P, Q\,|\,M)$ for probability scales $M \in \mathfrak{P}$ were introduced by Stummer (2007). Let us mention some other important previously considered special cases.

(a) For $\mathcal{X}$ finite or countable and counting measure $M = \lambda$, some authors were already cited above in connection with the formula (2) and the research areas (Ai) - (Aiii). In addition to them, one can mention also Byrne (1999), Collins et al. (2002), Murata et al. (2004), Cesa-Bianchi and Lugosi (2006).

(b) For an open Euclidean set $\mathcal{X}$ and Lebesgue measure $M = \lambda$ on it, one can mention Jones and Byrne (1990), as well as Resmerita and Anderssen (2007).

In the rest of this paper, we restrict ourselves to the Bregman distances $B_\phi(P, Q\,|\,M)$ scaled by finite measures $M \in \mathfrak{M}$ and to the same class of convex functions $\phi$ as considered in the $\phi$-divergence formulas (4) and (9). By using the remark after Definition 1 and applying (12) we get

$$D_\phi(P,M) \ge D_\phi(Q,M) + \int_{\mathcal{X}} \phi'_+\!\left(\frac{q}{m}\right)(p - q)\, d\lambda$$

if at least one of the right-hand side expressions is finite. Similarly,

$$B_\phi(P, Q\,|\,M) = D_\phi(P, M) - D_\phi(Q, M) - \int_{\mathcal{X}} \phi'_+\!\left(\frac{q}{m}\right)(p - q)\, d\lambda \tag{20}$$

if at least two of the right-hand side expressions are finite (which can be checked, e.g., by using (11) or (14)).

The formula (19) simplifies in the important special cases $M = P$ and $M = Q$. In the first case, due to $\phi(1) = 0$, it reduces to

$$B_\phi(P,Q\,|\,P) = \int_{\mathcal{X}} \left[\phi'_+\!\left(\frac{q}{p}\right)(q - p) - p\,\phi\!\left(\frac{q}{p}\right)\right] d\lambda = \int_{\mathcal{X}} \phi'_+\!\left(\frac{q}{p}\right)(q - p)\, d\lambda - D_\phi(Q,P), \tag{21}$$

where the difference (21) is meaningful if and only if $D_\phi(Q, P) = D_{\phi^*}(P, Q)$ is finite. The nonnegative divergence measure $B_\phi(P,Q) := B_\phi(P,Q\,|\,P)$ is thus the difference between the nonnegative dissimilarity measure

$$\mathfrak{D}_\phi(Q,P) = \int_{\mathcal{X}} \phi'_+\!\left(\frac{q}{p}\right)(q - p)\, d\lambda \ge D_\phi(Q,P)$$

and the nonnegative $\phi$-divergence $D_\phi(Q, P)$. Furthermore, in the second special case $M = Q$ the formula (19) leads to the equality

$$B_\phi(P,Q\,|\,Q) = D_\phi(P,Q) \tag{22}$$

without any restriction on $P, Q \in \mathfrak{P}$, as realized already by Stummer (2007).

Conclusion 1: Equality (22), together with the fact that $B_\phi(P, Q\,|\,M)$ depends in general on $M$ (see, e.g., Subsection B below), shows that the concept of scaled Bregman distance (19) strictly generalizes the concept of $\phi$-divergence $D_\phi(P, Q)$ of probability measures $P, Q$.

Example 1: As an illustration not considered earlier we can take the non-differentiable function $\phi(t) = |t - 1|$ for which

$$B_\phi(P,Q\,|\,Q) = V(P,Q),$$

i.e., this particular scaled Bregman distance reduces to the well-known total variation.

As demonstrated by an example in the Introduction, measurable transformations (statistics)

$$T : (\mathcal{X}, \mathcal{A}) \to (\mathcal{Y}, \mathcal{B}) \tag{23}$$

which are not sufficient for the pair $\{P, Q\}$ can increase those of the scaled Bregman distances $B_\phi(P, Q\,|\,M)$ which are not $\phi$-divergences. On the other hand, the transformations (23) which are sufficient for the pair $\{P, Q\}$ need not preserve these distances either. Next we formulate conditions under which the scaled Bregman distances $B_\phi(P, Q\,|\,M)$ are preserved by transformations of observations.


Definition 2: We say that the transformation (23) is sufficient for the triplet $\{P, Q, M\}$ if there exist measurable functions $g_P, g_Q, g_M : \mathcal{Y} \to \mathbb{R}$ and $h : \mathcal{X} \to \mathbb{R}$ such that

$$p(x) = g_P(Tx)\,h(x), \qquad q(x) = g_Q(Tx)\,h(x) \qquad \text{and} \qquad m(x) = g_M(Tx)\,h(x). \tag{24}$$

If $M$ is a probability measure then our definition reduces to the classical statistical sufficiency of the statistic $T$ for the family $\{P, Q, M\}$ (see pp. 18-19 in Lehman (2005)). All transformations (23) induce the probability measures $PT^{-1}$, $QT^{-1}$ and the finite measure $MT^{-1}$ on $(\mathcal{Y}, \mathcal{B})$. We prove that the scaled Bregman distances of the induced probability measures $PT^{-1}$, $QT^{-1}$ scaled by $MT^{-1}$ are preserved by sufficient transformations $T$.

Theorem 1: The transformations (23) sufficient for the triplet $\{P, Q, M\}$ preserve the scaled Bregman distances in the sense that

$$B_\phi\!\left(PT^{-1}, QT^{-1}\,|\,MT^{-1}\right) = B_\phi(P, Q\,|\,M). \tag{25}$$


Proof: By (19) and (24), the right-hand side of (25) is equal to

$$\int_{\mathcal{X}} \left[\phi_{P,M}(Tx) - \phi_{Q,M}(Tx) - \Delta_{P,Q,M}(Tx)\right] dM \tag{26}$$

for

$$\phi_{P,M}(y) = \phi\!\left(\frac{g_P(y)}{g_M(y)}\right), \qquad \phi_{Q,M}(y) = \phi\!\left(\frac{g_Q(y)}{g_M(y)}\right) \tag{27}$$

and

$$\Delta_{P,Q,M}(y) = \phi'_+\!\left(\frac{g_Q(y)}{g_M(y)}\right) \frac{g_P(y) - g_Q(y)}{g_M(y)}. \tag{28}$$

By Theorem D in Section 39 of Halmos (1964), the integral (26) is equal to

$$\int_{\mathcal{Y}} \left[\phi_{P,M}(y) - \phi_{Q,M}(y) - \Delta_{P,Q,M}(y)\right] dMT^{-1} \tag{29}$$

and, moreover,

$$P(T^{-1}B) = \int_{B} g_P(y)\, h(T^{-1}y)\, d\lambda T^{-1}$$

and similarly for $Q$ instead of $P$. Therefore

$$\frac{dPT^{-1}}{d\lambda T^{-1}} = g_P(y)\, h(T^{-1}y) \qquad \text{and} \qquad \frac{dQT^{-1}}{d\lambda T^{-1}} = g_Q(y)\, h(T^{-1}y)$$

which together with (27), (28) and (19) implies that the integral (29) is nothing but the left-hand side of (25). This completes the proof. □
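
A finite toy example may help to see what Theorem 1 asserts. The sketch below is an illustration under explicit assumptions, not the paper's construction: the sample space is $\{0,1,2,3\}$ with counting measure, the statistic `T` merges outcomes pairwise, and the factors `g_p`, `g_q`, `g_m`, `h` are made-up numbers chosen so that the factorization (24) holds. The generator $\phi(t) = (t-1)^2$ is used because its scaled Bregman distance is not a $\phi$-divergence in general.

```python
def scaled_bregman(p, q, m, phi, dphi):
    """Discrete version of (19) with counting measure as dominating measure."""
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

def push_forward(v, T, size):
    """Masses of the induced measure v T^{-1} on {0, ..., size - 1}."""
    out = [0.0] * size
    for x, mass in enumerate(v):
        out[T(x)] += mass
    return out

T = lambda x: x // 2                      # merges {0, 1} and {2, 3}
h = [1.0, 3.0, 2.0, 2.0]                  # common factor h(x) in (24)
g_p, g_q, g_m = [0.1, 0.15], [0.05, 0.2], [0.2, 0.1]

p = [g_p[T(x)] * h[x] for x in range(4)]  # probability vector
q = [g_q[T(x)] * h[x] for x in range(4)]  # probability vector
m = [g_m[T(x)] * h[x] for x in range(4)]  # finite measure, total mass 1.2

phi = lambda t: (t - 1) ** 2
dphi = lambda t: 2 * (t - 1)

before = scaled_bregman(p, q, m, phi, dphi)
after = scaled_bregman(
    push_forward(p, T, 2), push_forward(q, T, 2), push_forward(m, T, 2), phi, dphi
)
print(before, after)
```

Both values agree up to rounding, because the ratios $p/m$ and $q/m$ are constant on each set merged by `T`.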

Remark 2: Notice that by means of Remark 1(2) after Definition 1, the assertion of Theorem 1 can in principle be related to the preservation of $\phi$-divergences by transformations which are sufficient for the pair $\{P, Q\}$.

In the rest of this section we discuss some important special classes of scaled Bregman distances obtained for special distance-generating functions $\phi$.



A. Bregman logarithmic distance 

Let us consider the special function $\phi(t) = t \ln t$. Then $\phi'(t) = \ln t + 1$ so that (19) implies

$$B_{t \ln t}(P,Q\,|\,M) = \int_{\mathcal{X}} \left[p \ln\frac{p}{m} - q \ln\frac{q}{m} - \left(\ln\frac{q}{m} + 1\right)(p - q)\right] d\lambda = \int_{\mathcal{X}} \left[p \ln\frac{p}{m} - p \ln\frac{q}{m}\right] d\lambda = \int_{\mathcal{X}} p \ln\frac{p}{q}\, d\lambda = D_{t \ln t}(P,Q). \tag{30}$$

Thus, for $\phi(t) = t \ln t$ the Bregman distance $B_\phi(P, Q\,|\,M)$ exceptionally does not depend on the choice of the scaling and reference measures $M$ and $\lambda$; in fact, it always leads to the Kullback-Leibler information divergence (relative entropy) $D_{t \ln t}(P,Q)$ (cf. Stummer (2007)). As a side effect, this independence also gives rise to examples showing that the validity of (25) does not in general imply that $T$ is sufficient for the triplet $\{P, Q, M\}$.
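
The independence of the scale expressed by (30) is easy to observe numerically in the discrete case. The following sketch uses arbitrary illustrative vectors and two unrelated scale vectors (one of them not a probability vector); `scaled_bregman` is a hypothetical helper implementing the discrete form of (19).

```python
import math

def scaled_bregman(p, q, m, phi, dphi):
    """Discrete version of (19) with counting measure as dominating measure."""
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

phi = lambda t: t * math.log(t)
dphi = lambda t: math.log(t) + 1

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

b_scale_1 = scaled_bregman(p, q, [0.3, 0.3, 0.4], phi, dphi)
b_scale_2 = scaled_bregman(p, q, [2.0, 0.1, 5.0], phi, dphi)
kullback_leibler = sum(a * math.log(a / b) for a, b in zip(p, q))

print(b_scale_1, b_scale_2, kullback_leibler)
```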

B. Bregman reversed logarithmic distance 

Let now $\phi(t) = -\ln t$ so that $\phi'(t) = -1/t$. Then (19) implies

$$B_{-\ln t}(P,Q\,|\,M) = \int_{\mathcal{X}} \left[m \ln\frac{m}{p} - m \ln\frac{m}{q} + \frac{m}{q}\,(p - q)\right] d\lambda \tag{31}$$

$$= D_{t \ln t}(M, P) - D_{t \ln t}(M, Q) + \int_{\mathcal{X}} \frac{mp}{q}\, d\lambda - M(\mathcal{X}) \tag{32}$$

$$= D_{-\ln t}(P,M) - D_{-\ln t}(Q,M) + \int_{\mathcal{X}} \frac{mp}{q}\, d\lambda - M(\mathcal{X}) \tag{33}$$

where the equalities (32) and (33) hold if at least two out of the first three expressions on the right-hand side are finite. In particular, (33) implies (consistent with (22))

$$B_{-\ln t}(P,Q\,|\,Q) = D_{-\ln t}(P,Q) \tag{34}$$

and (32) implies for $D_{t \ln t}(P, Q) < \infty$ (consistent with (21))

$$B_{-\ln t}(P,Q\,|\,P) = \chi^2(P,Q) - D_{t \ln t}(P,Q) \tag{35}$$

where

$$\chi^2(P,Q) = \int_{\mathcal{X}} \frac{(p - q)^2}{q}\, d\lambda$$

is the well-known Pearson information divergence. From (34) and (35) one can also see that the Bregman distance $B_\phi(P, Q\,|\,M)$ does in general depend on the choice of the reference measure $M$.
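
Relations (34) and (35) can be verified on a small discrete example, which also shows the dependence on the scale. This is a sketch with made-up probability vectors; `scaled_bregman` is again a hypothetical helper for the discrete form of (19).

```python
import math

def scaled_bregman(p, q, m, phi, dphi):
    """Discrete version of (19) with counting measure as dominating measure."""
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

phi = lambda t: -math.log(t)
dphi = lambda t: -1.0 / t

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

b_scaled_by_q = scaled_bregman(p, q, q, phi, dphi)
b_scaled_by_p = scaled_bregman(p, q, p, phi, dphi)

reversed_kl = sum(b * math.log(b / a) for a, b in zip(p, q))   # D_{-ln t}(P, Q)
kl = sum(a * math.log(a / b) for a, b in zip(p, q))            # D_{t ln t}(P, Q)
chi2 = sum((a - b) ** 2 / b for a, b in zip(p, q))             # Pearson chi^2(P, Q)

print(b_scaled_by_q, reversed_kl)      # relation (34)
print(b_scaled_by_p, chi2 - kl)        # relation (35)
```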

C. Bregman power distances 

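In this subsection we restrict ourselves for simplicity to probability measures $M \in \mathfrak{P}$, i.e., we suppose $M(\mathcal{X}) = 1$. Under this assumption we investigate the scaled Bregman distances

$$B_\alpha(P, Q\,|\,M) = B_{\phi_\alpha}(P,Q\,|\,M), \qquad \alpha \in \mathbb{R},\ \alpha \ne 0,\ \alpha \ne 1 \tag{36}$$

for the family of power convex functions

$$\phi_\alpha(t) = \frac{t^\alpha - 1}{\alpha(\alpha - 1)} \qquad \text{with} \qquad \phi'_\alpha(t) = \frac{t^{\alpha - 1}}{\alpha - 1}. \tag{37}$$

For comparison and representation purposes, we use for $P$ (and analogously for $Q$ instead of $P$) the power divergences

$$D_\alpha(P,M) = D_{\phi_\alpha}(P,M) = \frac{1}{\alpha(\alpha - 1)} \left[\int_{\mathcal{X}} p^\alpha m^{1-\alpha}\, d\lambda - 1\right] = \frac{\exp \rho_\alpha(P,M) - 1}{\alpha(\alpha - 1)} \qquad \text{with} \qquad \rho_\alpha(P,M) = \ln \int_{\mathcal{X}} p^\alpha m^{1-\alpha}\, d\lambda \tag{38}$$

of real powers $\alpha$ different from 0 and 1, studied for arbitrary probability measures $P, M$ in Liese and Vajda (1987). They are one-one related to the Renyi divergences

$$R_\alpha(P,M) = \frac{\rho_\alpha(P,M)}{\alpha(\alpha - 1)}, \qquad \alpha \in \mathbb{R},\ \alpha \ne 0,\ \alpha \ne 1,$$

introduced in Liese and Vajda (1987) as an extension of the original narrower class of the divergences

$$R_\alpha(P,M) = \frac{\rho_\alpha(P,M)}{\alpha - 1}, \qquad \alpha > 0,\ \alpha \ne 1,$$

of Renyi (1961).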

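Returning now to the Bregman power distances, observe that if $D_\alpha(P,M) + D_\alpha(Q,M)$ is finite then (20), (36) and (37) imply for $\alpha \ne 0$, $\alpha \ne 1$

$$B_\alpha(P,Q\,|\,M) = D_\alpha(P,M) - D_\alpha(Q,M) - \frac{1}{\alpha - 1} \int_{\mathcal{X}} \left(\frac{q}{m}\right)^{\alpha - 1} (p - q)\, d\lambda$$

$$= D_\alpha(P,M) - D_\alpha(Q,M) - \frac{1}{\alpha - 1} \int_{\mathcal{X}} \left[\left(\frac{q}{m}\right)^{\alpha - 1} p - q^\alpha m^{1-\alpha}\right] d\lambda$$

$$= D_\alpha(P,M) - (1 - \alpha)\, D_\alpha(Q,M) - \frac{1}{\alpha - 1} \left[\int_{\mathcal{X}} \left(\frac{q}{m}\right)^{\alpha - 1} p\, d\lambda - 1\right]. \tag{39}$$

In particular, we get from here (consistent with (22))

$$B_\alpha(P,Q\,|\,Q) = D_\alpha(P,Q)$$

and in case of $D_\alpha(Q,P) = D_{1-\alpha}(P, Q) < \infty$ also

$$B_\alpha(P,Q\,|\,P) = (2 - \alpha)\, D_{\alpha - 1}(Q,P) + (\alpha - 1)\, D_\alpha(Q,P) = (2 - \alpha)\, D_{2-\alpha}(P, Q) + (\alpha - 1)\, D_{1-\alpha}(P, Q).$$

Here the coefficient of the first term is $2 - \alpha$: putting $M = P$ in (39), the last integral equals $\int q^{\alpha-1} p^{2-\alpha}\, d\lambda = 1 + (\alpha - 1)(\alpha - 2)\, D_{\alpha-1}(Q,P)$, and this sign also agrees with the limiting case $\alpha \downarrow 0$ given in (48).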
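
The two special cases of (39) can be checked numerically, including the sign of the coefficient in front of $D_{\alpha-1}(Q,P)$. The sketch below is an illustration with made-up probability vectors; it evaluates the identity $B_\alpha(P,Q\,|\,P) = (2-\alpha) D_{\alpha-1}(Q,P) + (\alpha-1) D_\alpha(Q,P)$, which follows from (39) with $M = P$. The helper names are assumptions, and $\alpha$ is chosen so that neither $\alpha$ nor $\alpha - 1$ equals 0 or 1.

```python
def power_divergence(p, m, alpha):
    """Discrete power divergence (38) for probability vectors p and m."""
    s = sum(a ** alpha * c ** (1 - alpha) for a, c in zip(p, m))
    return (s - 1) / (alpha * (alpha - 1))

def bregman_power(p, q, m, alpha):
    """Discrete scaled Bregman power distance (36) with generator (37)."""
    phi = lambda t: (t ** alpha - 1) / (alpha * (alpha - 1))
    dphi = lambda t: t ** (alpha - 1) / (alpha - 1)
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

results = {}
for alpha in (0.5, 3.0):
    scaled_by_q = bregman_power(p, q, q, alpha)
    scaled_by_p = bregman_power(p, q, p, alpha)
    rhs = (2 - alpha) * power_divergence(q, p, alpha - 1) \
        + (alpha - 1) * power_divergence(q, p, alpha)
    results[alpha] = (scaled_by_q, power_divergence(p, q, alpha), scaled_by_p, rhs)
    print(alpha, results[alpha])
```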

In the following theorem, and elsewhere in the sequel, we use the simplified notation

$$D_1(P, M) = D_{t \ln t}(P, M) \qquad \text{and} \qquad D_0(P,M) = D_{-\ln t}(P, M)$$

for the probability measures $P, M$ under consideration (and also later on where $M$ is only a finite measure). This step is motivated by the limit relations

$$\lim_{\alpha \downarrow 0} D_\alpha(P,M) = D_{-\ln t}(P,M) \qquad \text{and} \qquad \lim_{\alpha \uparrow 1} D_\alpha(P,M) = D_{t \ln t}(P,M) \tag{40}$$

proved as Proposition 2.9 in Liese and Vajda (1987) for arbitrary probability measures $P, M$. Applying these relations to the Bregman distances, we obtain



Theorem 2: If $D_0(P, M) + D_0(Q, M) < \infty$ then

$$\lim_{\alpha \downarrow 0} B_\alpha(P, Q\,|\,M) = D_0(P,M) - D_0(Q,M) + \int_{\mathcal{X}} \frac{mp}{q}\, d\lambda - 1 \tag{41}$$

$$= B_{-\ln t}(P,Q\,|\,M). \tag{42}$$

If $D_1(P,M) + D_1(Q,M) < \infty$ and

$$\lim_{\beta \downarrow 0} \int_{\mathcal{X}} \frac{(q/m)^{-\beta} - 1}{\beta}\, dP = \int_{\mathcal{X}} \lim_{\beta \downarrow 0} \frac{(q/m)^{-\beta} - 1}{\beta}\, dP = -\int_{\mathcal{X}} \ln\frac{q}{m}\, dP \tag{43}$$

then

$$\lim_{\alpha \uparrow 1} B_\alpha(P, Q\,|\,M) = D_1(P, M) - \int_{\mathcal{X}} \ln\frac{q}{m}\, dP \tag{44}$$

$$= D_1(P,Q) = B_{t \ln t}(P,Q\,|\,M). \tag{45}$$

Proof: If $0 < \alpha < 1$ then $D_\alpha(P,M)$, $D_\alpha(Q, M)$ are finite so that (39) holds. Applying the first relation of (40) in (39) we get (41), where the right-hand side is well defined because $D_0(P, M) + D_0(Q,M)$ is by assumption finite. Similarly, by using the second relation of (40) and the assumption (43) in (39) we end up at (44), where the right-hand side is well defined because $D_1(P, M) + D_1(Q, M)$ is assumed to be finite. The identity (42) follows from (41) and (33), and the identity (45) from (30). □
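
The two limits in Theorem 2 can be observed numerically by evaluating $B_\alpha$ for $\alpha$ close to 0 and close to 1. The sketch below uses made-up probability vectors, a probability scale `m`, and the offsets `1e-6`, all of which are illustrative assumptions; agreement is only approximate because $\alpha$ is not taken to the limit.

```python
import math

def scaled_bregman(p, q, m, phi, dphi):
    """Discrete version of (19) with counting measure as dominating measure."""
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

def bregman_power(p, q, m, alpha):
    """Discrete scaled Bregman power distance (36) with generator (37)."""
    phi = lambda t: (t ** alpha - 1) / (alpha * (alpha - 1))
    dphi = lambda t: t ** (alpha - 1) / (alpha - 1)
    return scaled_bregman(p, q, m, phi, dphi)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
m = [0.3, 0.3, 0.4]   # probability scale, as assumed in Subsection C

b_reversed_log = scaled_bregman(p, q, m, lambda t: -math.log(t), lambda t: -1.0 / t)
b_log = sum(a * math.log(a / b) for a, b in zip(p, q))   # equals B_{t ln t} by (30)

near_zero = bregman_power(p, q, m, 1e-6)
near_one = bregman_power(p, q, m, 1 - 1e-6)

print(near_zero, b_reversed_log)
print(near_one, b_log)
```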

Motivated by this theorem, we introduce for all probability measures $P, Q, M$ under consideration the simplified notations

$$B_1(P,Q\,|\,M) = B_{t \ln t}(P,Q\,|\,M) \tag{46}$$

and

$$B_0(P, Q\,|\,M) = B_{-\ln t}(P, Q\,|\,M), \tag{47}$$

and thus (45) and (42) become

$$B_1(P,Q\,|\,M) = \lim_{\alpha \uparrow 1} B_\alpha(P,Q\,|\,M)$$

and

$$B_0(P, Q\,|\,M) = \lim_{\alpha \downarrow 0} B_\alpha(P, Q\,|\,M).$$

Furthermore, in these notations the relations (30), (34) and (35) reformulate (under the corresponding assumptions) as follows:

$$B_1(P,Q\,|\,M) = D_1(P,Q),$$

$$B_0(P,Q\,|\,Q) = D_0(P,Q)$$

and

$$B_0(P,Q\,|\,P) = \chi^2(P,Q) - D_1(P,Q) = 2 D_2(P,Q) - D_1(P,Q). \tag{48}$$

Remark 3: The power divergences $D_\alpha(P,Q)$ are usually applied in statistics as criteria of discrimination or goodness-of-fit between the distributions $P$ and $Q$. The scaled Bregman distances $B_\alpha(P,Q\,|\,M)$, as generalizations of the power divergences $D_\alpha(P, Q) = B_\alpha(P, Q\,|\,Q)$, allow one to extend the 2D-discrimination plots $\{[D_\alpha(P, Q);\ \alpha] : c < \alpha < d\} \subset \mathbb{R}^2$ into more informative 3D-discrimination plots

$$\{[B_\alpha(P,Q\,|\,\beta P + (1 - \beta) Q);\ \alpha;\ \beta] : c < \alpha < d,\ 0 \le \beta \le 1\} \subset \mathbb{R}^3 \tag{49}$$

reducing to the former ones for $\beta = 0$. The simpler 2D-plots known under the name Q-Q-plots are famous tools for exploratory data analysis. It is plausible that computer-aided, appropriately coloured projections of the 3D-plots (49) allow a much closer insight into the relation between data and their statistical models. Therefore this computer-aided 3D-exploratory analysis deserves deeper attention and research. The next example presents projections of two such plots obtained for a binomial model $P$ and its data-based binomial alternative $Q$.

Example 2: Let $P = \mathrm{Bin}(n,p)$ be a binomial distribution with parameters $n$, $p$ (with a slight abuse of notation), and $Q = \mathrm{Bin}(n,q)$. Figure 1 presents projections of the corresponding 3D-discrimination plots (49) for $0.2 \le \alpha \le 2$ and $0 \le \beta \le 1$, where Subfigure (a) used the parameter constellation $n = 10$, $p = 0.25$, $q = 0.20$ whereas Subfigure (b) used $n = 10$, $p = 0.25$, $q = 0.30$. In both cases, the ranges of $B_\alpha(P, Q\,|\,\beta P + (1 - \beta) Q)$ are subsets of the interval $[0.06, 0.088]$.
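
The numbers behind such a discrimination plot can be tabulated with a short script. The following is a rough sketch for the constellation of Subfigure (a), not the authors' plotting code: the grids of $\alpha$ and $\beta$ values are illustrative assumptions (the value $\alpha = 1$ is skipped because (37) is undefined there), and the helper names are hypothetical.

```python
import math

def binom_pmf(n, r):
    """Probability vector of Bin(n, r) on {0, ..., n}."""
    return [math.comb(n, k) * r ** k * (1 - r) ** (n - k) for k in range(n + 1)]

def power_divergence(p, q, alpha):
    """Discrete power divergence (38)."""
    s = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q))
    return (s - 1) / (alpha * (alpha - 1))

def bregman_power(p, q, m, alpha):
    """Discrete scaled Bregman power distance (36) with generator (37)."""
    phi = lambda t: (t ** alpha - 1) / (alpha * (alpha - 1))
    dphi = lambda t: t ** (alpha - 1) / (alpha - 1)
    return sum(
        (phi(a / c) - phi(b / c) - dphi(b / c) * (a / c - b / c)) * c
        for a, b, c in zip(p, q, m)
    )

P = binom_pmf(10, 0.25)
Q = binom_pmf(10, 0.20)

alphas = [0.25, 0.5, 0.75, 1.25, 1.5, 1.75, 2.0]   # illustrative grid, alpha = 1 excluded
betas = [0.0, 0.25, 0.5, 0.75, 1.0]                # illustrative grid

grid = {}
for alpha in alphas:
    for beta in betas:
        scale = [beta * a + (1 - beta) * b for a, b in zip(P, Q)]
        grid[(alpha, beta)] = bregman_power(P, Q, scale, alpha)

print(min(grid.values()), max(grid.values()))
```

The printed minimum and maximum can be compared with the interval quoted in the example; the slice $\beta = 0$ of the grid reproduces the 2D-plot of $D_\alpha(P,Q)$.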

IV. EXPONENTIAL FAMILIES 

In this section we show that the scaled Bregman power distances $B_\alpha(P, Q\,|\,M)$ can be explicitly evaluated for probability measures $P, Q, M$ from exponential families. Let us restrict ourselves to the Euclidean observation spaces $(\mathcal{X}, \mathcal{A}) \subset (\mathbb{R}^d, \mathcal{B}^d)$ and denote by $x \cdot \theta$ the scalar product of $x, \theta \in \mathbb{R}^d$. The convex extended real valued function

$$b(\theta) = \ln \int_{\mathbb{R}^d} e^{x \cdot \theta}\, d\lambda(x), \qquad \theta \in \mathbb{R}^d, \tag{50}$$

and the convex set

$$\Theta = \{\theta \in \mathbb{R}^d : b(\theta) < \infty\}$$

define on $(\mathcal{X}, \mathcal{A})$ an exponential family of probability measures $\{P_\theta : \theta \in \Theta\}$ with the densities

$$p_\theta(x) = \frac{dP_\theta}{d\lambda}(x) = \exp\{x \cdot \theta - b(\theta)\}, \qquad x \in \mathbb{R}^d,\ \theta \in \Theta. \tag{51}$$

The cumulant function $b(\theta)$ is infinitely differentiable on the interior $\mathring{\Theta}$ of $\Theta$, with the gradient $\nabla b(\theta)$.

Note that (51) are exponential type densities in the natural form. All exponential type distributions such as Poisson, normal etc. can be transformed into this form (cf., e.g., Brown (1986)).



The formula

$$\int_{\mathbb{R}^d} e^{x \cdot \theta}\, d\lambda(x) = e^{b(\theta)}, \qquad \theta \in \Theta \tag{52}$$

follows from (50) and implies

$$\int_{\mathbb{R}^d} x\, e^{x \cdot \theta}\, d\lambda(x) = e^{b(\theta)}\, \nabla b(\theta), \qquad \theta \in \mathring{\Theta}. \tag{53}$$

Both formulas (52) and (53) will be useful in the sequel. We are interested in the scaled Bregman power distances

$$B_\alpha(P_{\theta_1}, P_{\theta_2}\,|\,P_{\theta_0}) \qquad \text{for } \theta_0, \theta_1, \theta_2 \in \Theta,\ \alpha \in \mathbb{R}.$$

Here $P_{\theta_1}, P_{\theta_2}, P_{\theta_0}$ are measure-theoretically equivalent probability measures, so that we can turn attention to the formulas (39), (30), (33), and (46) to (48), promising to reduce the evaluation of $B_\alpha(P_{\theta_1}, P_{\theta_2}\,|\,P_{\theta_0})$ to the evaluation of the power divergences $D_\alpha(P_{\theta_1}, P_{\theta_2})$. Therefore we first study these divergences and in particular verify their finiteness, which was a sufficient condition for the applicability of the formulas (39), (30) and (33). To begin with, let us mention the following well-established representation:

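Theorem 3: If $\alpha \in \mathbb{R}$ differs from 0 and 1, then the power divergence $D_\alpha(P_{\theta_1}, P_{\theta_2})$ is for all $\theta_1, \theta_2 \in \Theta$ finite and given by the expression

$$\frac{\exp\{b(\alpha\theta_1 + (1 - \alpha)\theta_2) - \alpha\, b(\theta_1) - (1 - \alpha)\, b(\theta_2)\} - 1}{\alpha(\alpha - 1)}. \tag{54}$$

In particular, it is invariant with respect to the shifts of the cumulant function linear in $\theta \in \Theta$ in the sense that it coincides with the power divergence $D_\alpha(\tilde P_{\theta_1}, \tilde P_{\theta_2})$ in the exponential family with the cumulant function $\tilde b(\theta) = b(\theta) + c + v \cdot \theta$, where $c$ is a real number and $v$ a $d$-vector.

This can be easily seen by slightly extending (38) to get for arbitrary $\alpha \in \mathbb{R}$ and $\theta_1, \theta_2 \in \Theta$

$$1 + \alpha(\alpha - 1)\, D_\alpha(P_{\theta_1}, P_{\theta_2}) = \int_{\mathcal{X}} p_{\theta_1}^{\alpha}\, p_{\theta_2}^{1-\alpha}\, d\lambda = \frac{\int_{\mathbb{R}^d} \exp\{x \cdot [\alpha\theta_1 + (1 - \alpha)\theta_2]\}\, d\lambda(x)}{\exp\{\alpha\, b(\theta_1) + (1 - \alpha)\, b(\theta_2)\}}$$

which together with (52) gives the desired result.

The skew symmetry as well as the remaining power divergences $D_0(P_{\theta_1}, P_{\theta_2})$ and $D_1(P_{\theta_1}, P_{\theta_2})$ are given in the next, straightforward theorem.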

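Theorem 4: For all $\theta_1, \theta_2 \in \Theta$ and $\alpha \in \mathbb{R}$ different from 0 and 1 there holds

$$D_\alpha(P_{\theta_2}, P_{\theta_1}) = D_{1-\alpha}(P_{\theta_1}, P_{\theta_2})$$

and for $\theta_2 \in \mathring{\Theta}$

$$D_{-\ln t}(P_{\theta_1}, P_{\theta_2}) = D_0(P_{\theta_1}, P_{\theta_2}) = \lim_{\alpha \downarrow 0} D_\alpha(P_{\theta_1}, P_{\theta_2}) = b(\theta_1) - b(\theta_2) - \nabla b(\theta_2)\,(\theta_1 - \theta_2) \tag{55}$$

$$= \lim_{\alpha \uparrow 1} D_\alpha(P_{\theta_2}, P_{\theta_1}) = D_1(P_{\theta_2}, P_{\theta_1}) = D_{t \ln t}(P_{\theta_2}, P_{\theta_1}). \tag{56}$$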
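
Formulas (54) and (55) can be illustrated with the Poisson family, which in natural form has $\theta = \ln \mu$ and cumulant function $b(\theta) = e^\theta$ with respect to the dominating measure $\lambda(\{x\}) = 1/x!$ on the nonnegative integers. The sketch below is an illustration under these assumptions: it compares the closed forms with truncated direct sums, and the truncation at 200 terms, the parameter values and the helper names are arbitrary choices.

```python
import math

def log_pmf(theta, x):
    """Log of the Poisson probability mass with natural parameter theta = ln(mean)."""
    return x * theta - math.exp(theta) - math.lgamma(x + 1)

b = math.exp        # cumulant function of the Poisson family
grad_b = math.exp   # its derivative

def power_div_direct(t1, t2, alpha, n_terms=200):
    """Power divergence (38) by direct (truncated) summation."""
    s = sum(
        math.exp(alpha * log_pmf(t1, x) + (1 - alpha) * log_pmf(t2, x))
        for x in range(n_terms)
    )
    return (s - 1) / (alpha * (alpha - 1))

def power_div_formula(t1, t2, alpha):
    """Closed form (54)."""
    rho = b(alpha * t1 + (1 - alpha) * t2) - alpha * b(t1) - (1 - alpha) * b(t2)
    return (math.exp(rho) - 1) / (alpha * (alpha - 1))

def kl_direct(t_from, t_to, n_terms=200):
    """Kullback-Leibler divergence D_1(P_from, P_to) by truncated summation."""
    return sum(
        math.exp(log_pmf(t_from, x)) * (log_pmf(t_from, x) - log_pmf(t_to, x))
        for x in range(n_terms)
    )

theta1, theta2 = math.log(2.0), math.log(3.0)

for alpha in (0.5, 2.0):
    print(alpha, power_div_direct(theta1, theta2, alpha),
          power_div_formula(theta1, theta2, alpha))

# Right-hand side of (55), which by (56) equals D_1(P_theta2, P_theta1).
kl_formula = b(theta1) - b(theta2) - grad_b(theta2) * (theta1 - theta2)
print(kl_direct(theta2, theta1), kl_formula)
```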



TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION THEORY 







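The main result of this section is the following representation theorem for Bregman distances in exponential families. We formulate this in terms of the functions

$$\rho_\alpha(\theta_1, \theta_2) = b(\alpha\theta_1 + (1 - \alpha)\theta_2) - \alpha\, b(\theta_1) - (1 - \alpha)\, b(\theta_2) \tag{57}$$

(where the right-hand side is finite if $0 < \alpha < 1$), as well as the functions $\sigma_\alpha(\theta_0, \theta_1, \theta_2)$ ($\alpha \in \mathbb{R}$, $\theta_0, \theta_1, \theta_2 \in \Theta$) defined as the difference

$$\sigma_\alpha(\theta_0, \theta_1, \theta_2) = \sigma^{I}_\alpha(\theta_0, \theta_1, \theta_2) - \sigma^{II}_\alpha(\theta_0, \theta_1, \theta_2) \tag{58}$$

of the nonnegative (possibly infinite)

$$\sigma^{I}_\alpha(\theta_0, \theta_1, \theta_2) = b\big(\alpha\theta_1 + (1 - \alpha)\,[\theta_1 - \theta_2 + \theta_0]\big) \tag{59}$$

and the finite

$$\sigma^{II}_\alpha(\theta_0, \theta_1, \theta_2) = \alpha\, b(\theta_1) + (1 - \alpha)\,[b(\theta_1) - b(\theta_2) + b(\theta_0)]. \tag{60}$$

Alternatively,

$$\sigma_\alpha(\theta_0, \theta_1, \theta_2) = \rho_\alpha(\theta_1, \theta_0 + \theta_1 - \theta_2) + (1 - \alpha)\,[b(\theta_0 + \theta_1 - \theta_2) - b(\theta_0) - b(\theta_1) + b(\theta_2)]. \tag{61}$$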

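Theorem 5: Let $\theta_0, \theta_1, \theta_2 \in \Theta$ be arbitrary. If $\alpha(\alpha - 1) \ne 0$ then the Bregman distance of the exponential family distributions $P_{\theta_1}$ and $P_{\theta_2}$ scaled by $P_{\theta_0}$ is given by the formula

$$B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = \frac{\exp \rho_\alpha(\theta_1, \theta_0)}{\alpha(\alpha - 1)} + \frac{\exp \rho_\alpha(\theta_2, \theta_0)}{\alpha} + \frac{\exp \sigma_\alpha(\theta_0, \theta_1, \theta_2)}{1 - \alpha}. \tag{62}$$

If $\theta_0$ respectively $\theta_1$ is from the interior $\Theta^{\circ}$ of $\Theta$, then the limiting Bregman power distances are

$$B_0(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = b(\theta_1) - b(\theta_2) - \nabla b(\theta_0)\,(\theta_1 - \theta_2) + \exp \sigma_0(\theta_0, \theta_1, \theta_2) - 1 \tag{63}$$

respectively

$$B_1(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = b(\theta_2) - b(\theta_1) - \nabla b(\theta_1)\,(\theta_2 - \theta_1). \tag{64}$$

In particular, all scaled Bregman distances (62) - (64) are invariant with respect to the shifts of the cumulant function linear in $\theta \in \Theta$ in the sense that they coincide with the scaled Bregman distances $\tilde{B}_\alpha(\tilde{P}_{\theta_1}, \tilde{P}_{\theta_2} \,|\, \tilde{P}_{\theta_0})$ in the exponential family with the cumulant function $\tilde{b}(\theta) = b(\theta) + c + v \cdot \theta$, where $c$ is a real number and $v$ a $d$-vector.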
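The representation (62) can be compared with a direct evaluation of the scaled Bregman distance. The sketch below assumes the Poisson family with $b(\theta) = e^\theta$ and reference measure $\lambda[\{x\}] = 1/x!$, and it assumes that the power distances are generated by $\phi_\alpha(t) = (t^\alpha - \alpha t + \alpha - 1)/(\alpha(\alpha-1))$, which is the standard generator of the power divergences. All function names and parameter values are hypothetical.

```python
import math

def b(theta):
    # Poisson cumulant function; reference measure lambda[{x}] = 1/x! (assumed example)
    return math.exp(theta)

def rho(a, th1, th2):
    # rho_alpha from (57)
    return b(a * th1 + (1 - a) * th2) - a * b(th1) - (1 - a) * b(th2)

def sigma(a, th0, th1, th2):
    # sigma_alpha = sigma^I - sigma^II from (58)-(60)
    s_one = b(a * th1 + (1 - a) * (th1 - th2 + th0))
    s_two = a * b(th1) + (1 - a) * (b(th1) - b(th2) + b(th0))
    return s_one - s_two

def bregman_closed(a, th1, th2, th0):
    # Representation (62), valid for a * (a - 1) != 0
    return (math.exp(rho(a, th1, th0)) / (a * (a - 1))
            + math.exp(rho(a, th2, th0)) / a
            + math.exp(sigma(a, th0, th1, th2)) / (1 - a))

def phi(a, u):
    # Assumed power function generating the power divergences
    return (u ** a - a * u + a - 1) / (a * (a - 1))

def dphi(a, u):
    # Derivative of phi with respect to u
    return (u ** (a - 1) - 1) / (a - 1)

def bregman_direct(a, th1, th2, th0, n_terms=100):
    # Truncated sum of m * [phi(p/m) - phi(q/m) - phi'(q/m) * (p/m - q/m)]
    total = 0.0
    for x in range(n_terms):
        m = math.exp(x * th0 - b(th0) - math.lgamma(x + 1))  # scaling pmf
        u = math.exp(x * (th1 - th0) - b(th1) + b(th0))      # ratio p/m
        v = math.exp(x * (th2 - th0) - b(th2) + b(th0))      # ratio q/m
        total += m * (phi(a, u) - phi(a, v) - dphi(a, v) * (u - v))
    return total

for a in (0.5, 2.0):
    print(a, bregman_closed(a, 0.2, -0.3, 0.5), bregman_direct(a, 0.2, -0.3, 0.5))
```

The density ratios are computed from their logarithms so that no division by an underflowing probability occurs in the tail of the sum.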



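Proof: (a) By (51) it holds for every $\alpha \in \mathbb{R}$ and $\theta_0, \theta_1, \theta_2 \in \Theta$

$$\left(\frac{p_{\theta_2}(x)}{p_{\theta_0}(x)}\right)^{\alpha-1} p_{\theta_1}(x) = \exp\{(\alpha - 1)\,[x \cdot (\theta_2 - \theta_0) - (b(\theta_2) - b(\theta_0))] + x \cdot \theta_1 - b(\theta_1)\}$$

$$= \exp\{x \cdot (\alpha\theta_1 + (1-\alpha)[\theta_1 - \theta_2 + \theta_0]) - \sigma^{II}_\alpha(\theta_0, \theta_1, \theta_2)\}$$

with $\sigma^{II}_\alpha(\theta_0, \theta_1, \theta_2)$ from (60). Since (52) leads to

$$\int_{\mathcal{X}} \exp\{x \cdot (\alpha\theta_1 + (1-\alpha)[\theta_1 - \theta_2 + \theta_0])\}\, d\lambda = \exp \sigma^{I}_\alpha(\theta_0, \theta_1, \theta_2)$$

for $\sigma^{I}_\alpha(\theta_0, \theta_1, \theta_2)$ given by (59), it holds

$$\int_{\mathcal{X}} \left(\frac{p_{\theta_2}}{p_{\theta_0}}\right)^{\alpha-1} p_{\theta_1}\, d\lambda = \exp \sigma_\alpha(\theta_0, \theta_1, \theta_2) \tag{65}$$

where $\sigma_\alpha(\theta_0, \theta_1, \theta_2)$ was defined in (58). Now, by plugging $P = P_{\theta_1}$, $Q = P_{\theta_2}$, $M = P_{\theta_0}$ (cf. (51)) in (39), we get for $\alpha(\alpha - 1) \ne 0$ the Bregman distances

$$B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = D_\alpha(P_{\theta_1}, P_{\theta_0}) - (1-\alpha)\, D_\alpha(P_{\theta_2}, P_{\theta_0}) + \frac{1}{1-\alpha}\left[\int_{\mathcal{X}} \left(\frac{p_{\theta_2}}{p_{\theta_0}}\right)^{\alpha-1} p_{\theta_1}\, d\lambda - 1\right]. \tag{66}$$

By combining the power divergence formula (54) with (57), one ends up with

$$D_\alpha(P_{\theta_1}, P_{\theta_2}) = \frac{\exp \rho_\alpha(\theta_1, \theta_2) - 1}{\alpha(\alpha - 1)}$$

which together with (65) and (66) leads to the desired representation (62).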

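(b) By the definition of $B_0(P, Q \,|\, M)$ and by (51)

$$B_0(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = D_0(P_{\theta_1}, P_{\theta_0}) - D_0(P_{\theta_2}, P_{\theta_0}) + \int_{\mathcal{X}} \frac{p_{\theta_0}\, p_{\theta_1}}{p_{\theta_2}}\, d\lambda - 1$$

where

$$\int_{\mathcal{X}} \frac{p_{\theta_0}\, p_{\theta_1}}{p_{\theta_2}}\, d\lambda = \exp \sigma_0(\theta_0, \theta_1, \theta_2) \quad \text{(cf. (65))}.$$

For $\theta_0 \in \Theta^{\circ}$ the desired assertion (63) follows from here and from the formulas

$$D_0(P_{\theta_i}, P_{\theta_0}) = b(\theta_i) - b(\theta_0) - \nabla b(\theta_0)\,(\theta_i - \theta_0) \quad \text{for } i = 1, 2$$

obtained from (55).

(c) The desired formula (64) follows immediately from the definition (46) and from the formulas (30), (45), (55) and (56).

(d) The finally stated invariance is immediate. □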

The Conclusion 1 of Section III about the relation between scaled Bregman distances and $\phi$-divergences can be completed by the following relation between both of them and the classical Bregman distances (1).

Conclusion 2: Let $B_\phi(x, y)$ be the classical Bregman distance (1) of $x, y \in \mathbb{R}^d$ and $\mathcal{P} = \{P_\theta : \theta \in \mathbb{R}^d\}$ the exponential family with cumulant function $\phi$, i.e., with densities $p_\theta(s) = \exp\{s \cdot \theta - \phi(\theta)\}$, $s \in \mathbb{R}^d$. Then for all $P_x, P_y, P_z \in \mathcal{P}$

$$B_\phi(x, y) = B_1(P_y, P_x \,|\, P_z) = D_1(P_y, P_x),$$

i.e., there is a one-to-one relation between the classical Bregman distance $B_\phi(x, y)$ and the scaled Bregman distances $B_1(P_y, P_x \,|\, P_z)$ and power divergences $D_1(P_y, P_x)$ of the exponential probability measures generated by the cumulant function $\phi$. This means that the family $\{B_\alpha(P_y, P_x \,|\, P_z) : \alpha \in \mathbb{R},\ z \in \mathbb{R}^d\}$ of scaled Bregman power distances and the family $\{D_\alpha(P_y, P_x) : \alpha \in \mathbb{R}\}$ of power divergences extend the classical Bregman distances $B_\phi(x, y)$, to which they reduce at $\alpha = 1$ and arbitrary $P_z \in \mathcal{P}$. In fact, we meet here the extension of the classical Bregman distances in three different directions: the first represented by various power parameters $\alpha \in \mathbb{R}$, the second represented by various possible exponential distributions parametrized by $\theta \in \mathbb{R}^d$, and the third represented by the exponential distribution parameters $z \in \mathbb{R}^d$ which are relevant when $\alpha \ne 1$.

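Remark 4: We see from Theorems 4 and 5 that, consistent with (30), (45), for arbitrary interior parameters $\theta_0, \theta_1, \theta_2 \in \Theta^{\circ}$

$$B_1(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = D_1(P_{\theta_1}, P_{\theta_2}),$$

i.e. that the Bregman distance of order $\alpha = 1$ of exponential family distributions $P_{\theta_1}$, $P_{\theta_2}$ does not depend on the scaling distribution $P_{\theta_0}$. The distance of order $\alpha = 0$ satisfies the relation

$$B_0(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0}) = D_0(P_{\theta_1}, P_{\theta_2}) + \exp \sigma_0(\theta_0, \theta_1, \theta_2) - 1 = B_1(P_{\theta_2}, P_{\theta_1} \,|\, P_{\theta_0}) + \Delta(\theta_0, \theta_1, \theta_2),$$

where

$$\Delta(\theta_0, \theta_1, \theta_2) = \exp \sigma_0(\theta_0, \theta_1, \theta_2) - 1$$

represents a deviation from the skew-symmetry of the Bregman distances $B_0(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0})$ and $B_1(P_{\theta_2}, P_{\theta_1} \,|\, P_{\theta_0})$ of $P_{\theta_1}$ and $P_{\theta_2}$. This deviation is zero if (for strictly convex $b(\theta)$ if and only if) $\theta_0 = \theta_2$.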

Remark 5: We see from the formulas (54) - (64) that for all $\alpha \in \mathbb{R}$ the quantities $D_\alpha(P_{\theta_1}, P_{\theta_2})$, $\rho_\alpha(\theta_1, \theta_2)$, $\sigma_\alpha(\theta_0, \theta_1, \theta_2)$ and $B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0})$ only depend on the cumulant function $b(\theta)$ defined in (50), and not directly on the reference measure $\lambda$ used in the definition formulas (50), (52).

V. EXPONENTIAL APPLICATIONS 

In this section we illustrate the evaluation of scaled Bregman divergences $B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0})$ for some important discrete and continuous exponential families, and also for exponentially distributed random processes.

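Binomial model: Consider for fixed $n > 2$ on the observation space $\mathcal{X} = \{0, \ldots, n\}$ the binomial distribution $P_\theta$ determined by

$$P_\theta[\{x\}] = \lambda[\{x\}] \cdot \exp\{x \cdot \theta - b(\theta)\} = \binom{n}{x} p^x (1 - p)^{n - x}$$

for $x \in \{0, \ldots, n\}$, where

$$\lambda[\{x\}] = \binom{n}{x}, \qquad \theta = \ln\frac{p}{1 - p} \in \Theta = \mathbb{R} \qquad \text{and} \qquad b(\theta) = n \ln(1 + e^\theta).$$

After some calculations one obtains from (57) and (61)

$$\rho_\alpha(\theta_1, \theta_2) = n \ln \frac{1 + e^{\alpha\theta_1 + (1-\alpha)\theta_2}}{(1 + e^{\theta_1})^{\alpha}\,(1 + e^{\theta_2})^{1-\alpha}}$$

and

$$\sigma_\alpha(\theta_0, \theta_1, \theta_2) = n \ln \frac{\left(1 + e^{\alpha\theta_1 + (1-\alpha)(\theta_0 + \theta_1 - \theta_2)}\right)(1 + e^{\theta_2})^{1-\alpha}}{(1 + e^{\theta_0})^{1-\alpha}\,(1 + e^{\theta_1})}.$$

Applying Theorem 5 one achieves an explicit formula for the binomial Bregman distances $B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0})$ from here.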
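The two binomial closed forms can be verified against finite sums over the observation space. This is a minimal sketch, assuming the logarithmic expressions $n \ln[(1 + e^{\alpha\theta_1 + (1-\alpha)\theta_2}) / ((1+e^{\theta_1})^\alpha (1+e^{\theta_2})^{1-\alpha})]$ for $\rho_\alpha$ and the analogous one for $\sigma_\alpha$ that follow from (57)-(60) with $b(\theta) = n\ln(1+e^\theta)$; the function names and the chosen parameters are illustrative.

```python
import math

def pmf(theta, n, x):
    # Binomial probability with natural parameter theta = ln(p / (1 - p))
    p = 1.0 / (1.0 + math.exp(-theta))
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def rho_binomial(a, th1, th2, n):
    # Closed form of rho_alpha for b(theta) = n * ln(1 + e^theta)
    num = 1 + math.exp(a * th1 + (1 - a) * th2)
    den = (1 + math.exp(th1)) ** a * (1 + math.exp(th2)) ** (1 - a)
    return n * math.log(num / den)

def sigma_binomial(a, th0, th1, th2, n):
    # Closed form of sigma_alpha for the same cumulant function
    num = (1 + math.exp(a * th1 + (1 - a) * (th0 + th1 - th2))) * (1 + math.exp(th2)) ** (1 - a)
    den = (1 + math.exp(th0)) ** (1 - a) * (1 + math.exp(th1))
    return n * math.log(num / den)

n, a = 7, 0.4
th0, th1, th2 = 0.3, -0.5, 1.1

# Sum of p1^a * p2^(1-a), which should equal exp(rho_alpha(th1, th2))
hellinger_sum = sum(pmf(th1, n, x) ** a * pmf(th2, n, x) ** (1 - a) for x in range(n + 1))
# Sum of (p2/p0)^(a-1) * p1, which should equal exp(sigma_alpha) by (65)
mixed_sum = sum((pmf(th2, n, x) / pmf(th0, n, x)) ** (a - 1) * pmf(th1, n, x)
                for x in range(n + 1))

print(hellinger_sum, math.exp(rho_binomial(a, th1, th2, n)))
print(mixed_sum, math.exp(sigma_binomial(a, th0, th1, th2, n)))
```

Inserting these two quantities into (62) then gives the binomial Bregman distance without any further summation.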

Rayleigh model: The Rayleigh distributions play an important role in communication theory. They are defined by the probability densities

$$p_\theta(x) = \theta x \exp\left\{-\frac{\theta x^2}{2}\right\}, \qquad \theta \in \Theta = (0, \infty) \tag{67}$$

with respect to the restriction $\lambda_+$ of the Lebesgue measure $\lambda$ on the observation space $\mathcal{X} = (0, \infty)$. The mapping

$$T(x) = -\frac{x^2}{2}$$

from the positive half-line $(0, \infty)$ to the negative half-line $(-\infty, 0)$ transforms (67) into the family of Rayleigh densities

$$p_\theta(x) = \theta \exp\{\theta x\} = \exp\{\theta x - b(\theta)\} \qquad \text{for } b(\theta) = -\ln\theta,\ \theta > 0$$

with respect to the restriction $\lambda_-$ of the Lebesgue measure $\lambda$ on the observation space $\mathcal{X} = (-\infty, 0)$. These are the Rayleigh densities in the natural form assumed in (51). After some calculations one derives from (57)

$$\rho_\alpha(\theta_1, \theta_2) = \ln \frac{\theta_1^{\alpha}\, \theta_2^{1-\alpha}}{\alpha\theta_1 + (1-\alpha)\theta_2} \tag{68}$$

and

$$\sigma_\alpha(\theta_0, \theta_1, \theta_2) = \ln \frac{\theta_1\, \theta_0^{1-\alpha}}{\left(\alpha\theta_1 + (1-\alpha)(\theta_0 + \theta_1 - \theta_2)\right) \theta_2^{1-\alpha}}.$$

Applying Theorem 5 one obtains the Rayleigh-Bregman distances $B_\alpha(P_{\theta_1}, P_{\theta_2} \,|\, P_{\theta_0})$ from here.
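The Rayleigh expressions can be checked by numerical integration in the original coordinates on $(0,\infty)$, which also illustrates that the sufficient transformation does not change the integrals. The sketch below assumes the density $\theta x \exp\{-\theta x^2/2\}$ and the logarithmic closed forms for $\rho_\alpha$ and $\sigma_\alpha$ that follow from (57)-(60) with $b(\theta) = -\ln\theta$; the midpoint rule, the truncation of the integration range at 12, and all names are illustrative choices.

```python
import math

def rayleigh_pdf(theta, x):
    # Rayleigh density theta * x * exp(-theta * x^2 / 2) on (0, infinity)
    return theta * x * math.exp(-theta * x * x / 2.0)

def rho_rayleigh(a, th1, th2):
    # Closed form of rho_alpha for b(theta) = -ln(theta)
    return math.log(th1 ** a * th2 ** (1 - a) / (a * th1 + (1 - a) * th2))

def sigma_rayleigh(a, th0, th1, th2):
    # Closed form of sigma_alpha; the mixed parameter must be positive
    mix = a * th1 + (1 - a) * (th0 + th1 - th2)
    return math.log(th1 * th0 ** (1 - a) / (mix * th2 ** (1 - a)))

def midpoint_integral(f, lo, hi, steps=100000):
    # Plain midpoint rule; adequate here because the integrands are smooth
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

a, th0, th1, th2 = 0.6, 1.5, 1.0, 2.0

hellinger = midpoint_integral(
    lambda x: rayleigh_pdf(th1, x) ** a * rayleigh_pdf(th2, x) ** (1 - a), 0.0, 12.0)
mixed = midpoint_integral(
    lambda x: (rayleigh_pdf(th2, x) / rayleigh_pdf(th0, x)) ** (a - 1) * rayleigh_pdf(th1, x),
    0.0, 12.0)

print(hellinger, math.exp(rho_rayleigh(a, th1, th2)))
print(mixed, math.exp(sigma_rayleigh(a, th0, th1, th2)))
```

The closed form for $\sigma_\alpha$ is only meaningful when $\alpha\theta_1 + (1-\alpha)(\theta_0 + \theta_1 - \theta_2) > 0$, which holds for the parameters used here.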

Theorem 1 about the preservation of the scaled Bregman 
distances by statistically sufficient transformations is useful 
for the evaluation of these distances in exponential families. 
It implies for example that these distances in the normal and 
lognormal families coincide. The next two examples dealing 
with distances of stochastic processes make use of this theorem 
too. 

Exponentially distributed signals: Most of the random processes modelling physical, social and economic phenomena are exponentially distributed. Important among them are the real valued Lévy processes $\mathbf{X}_t = (X_s : 0 \le s \le t)$ with trajectories $\mathbf{x}_t = (x_s : 0 \le s \le t)$ from the Skorokhod observation spaces $(\mathcal{X}_t, \mathcal{A}_t)$ and parameters from the set

$$\Theta = \{\theta \in \mathbb{R} : c(\theta) < \infty\}$$

defined by means of the function

$$c(\theta) = \int_{\mathbb{R} \setminus \{0\}} \frac{x^2 e^{\theta x}}{1 + x^2}\, d\nu(x)$$

where $\nu$ is a Lévy measure which determines the probability distribution of the size of jumps of the process and the intensity with which jumps occur. It is assumed that 0 belongs to $\Theta$ and it is known (cf., e.g., Küchler and Sørensen (1994)) that the probability distributions $P_{t,\theta}$ induced by these processes on $(\mathcal{X}_t, \mathcal{A}_t)$ are mutually measure-theoretically equivalent with the relative densities

$$\frac{dP_{t,\theta}}{dP_{t,0}}(\mathbf{x}_t) = \exp\{\theta\, x_t - b_t(\theta)\} \tag{69}$$

for the end $x_t$ of the trajectory $\mathbf{x}_t$. The cumulant function appearing here is

$$b_t(\theta) = t \left[\delta\theta + \frac{1}{2}\sigma^2\theta^2 + \gamma(\theta)\right] \tag{70}$$

for two genuine parameters $\delta \in \mathbb{R}$ respectively $\sigma > 0$ of the process which determine its intensity of drift respectively its volatility, and for the function

$$\gamma(\theta) = \int_{\mathbb{R} \setminus \{0\}} \left[e^{\theta x} - 1 - \frac{\theta x}{1 + x^2}\right] d\nu(x).$$

The formula (69) implies that the family $\mathcal{P}_t = \{P_{t,\theta} : \theta \in \Theta\}$ is exponential on $(\mathcal{X}_t, \mathcal{A}_t)$ and that the "extremally reduced" observation $T(\mathbf{x}_t) = x_t$ is statistically sufficient for it. Thus, by Theorem 1,

$$B_\alpha(P_{t,\theta_1}, P_{t,\theta_2} \,|\, P_{t,\theta_0}) = B_\alpha(Q_{t,\theta_1}, Q_{t,\theta_2} \,|\, Q_{t,\theta_0}) \tag{71}$$

where $Q_{t,\theta}$ is a probability distribution on the real line governing the marginal distribution of the last observed value $X_t$ of the process $\mathbf{X}_t$.

Queueing processes and Brownian motions: For illustration of the general result of the previous subsection we can take the family of Poisson processes with initial value $X_0 = 0$ and intensities $\eta = e^\theta$, $\theta \in \Theta = \mathbb{R}$, for which $\delta = \sigma = 0$ and $c(\theta) = e^\theta - 1$ so that $b_t(\theta) = t\,(e^\theta - 1)$. Then $Q_{t,\theta}$ is the Poisson distribution $\mathrm{Poi}(\tau)$ with parameter $\tau = t\eta = t e^\theta$ and probabilities

$$Q_{t,\theta}[\{x\}] = \frac{\tau^x e^{-\tau}}{x!} = \lambda[\{x\}] \cdot \exp\{x\vartheta - e^{\vartheta}\} \qquad \text{for } \vartheta = \ln\tau = \theta + \ln t,\quad \lambda[\{x\}] = \frac{1}{x!}.$$

The exponential structure is similar as above, so that by applying (57) to the cumulant function $b(\vartheta) = e^{\vartheta} = t e^{\theta}$ we get for the Poisson processes with parameters $\theta_1$ and $\theta_2$

$$\rho_\alpha(\theta_1, \theta_2) = t \left[e^{\alpha\theta_1 + (1-\alpha)\theta_2} - \alpha e^{\theta_1} - (1-\alpha) e^{\theta_2}\right].$$

Combining this with (61) and Theorem 5 we obtain an explicit formula for the scaled Bregman distance (71) of these Poisson processes.
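The expression for $\rho_\alpha$ of the Poisson processes can be compared with a direct sum over the Poisson marginals $\mathrm{Poi}(t e^{\theta_i})$ of the last observed value. The following is a minimal sketch, assuming $\rho_\alpha(\theta_1,\theta_2) = t[e^{\alpha\theta_1+(1-\alpha)\theta_2} - \alpha e^{\theta_1} - (1-\alpha)e^{\theta_2}]$ and restricting to $0 < \alpha < 1$ so that the truncated sum is numerically harmless; the helper names and parameter values are hypothetical.

```python
import math

def rho_poisson_process(a, t, th1, th2):
    # rho_alpha for Poisson processes observed up to time t
    return t * (math.exp(a * th1 + (1 - a) * th2)
                - a * math.exp(th1) - (1 - a) * math.exp(th2))

def poisson_pmf(mean, x):
    # Poisson probability computed on the log scale for numerical stability
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

def hellinger_sum(a, t, th1, th2, n_terms=200):
    # Sum of q1^a * q2^(1-a) over the marginals Poi(t * e^theta_i) of X_t
    m1, m2 = t * math.exp(th1), t * math.exp(th2)
    return sum(poisson_pmf(m1, x) ** a * poisson_pmf(m2, x) ** (1 - a)
               for x in range(n_terms))

a, t, th1, th2 = 0.3, 3.0, 0.4, -0.1
rho = rho_poisson_process(a, t, th1, th2)
# Power divergence of the two processes, assuming D_alpha = (exp(rho) - 1) / (alpha (alpha - 1))
d_alpha = math.expm1(rho) / (a * (a - 1))
print(hellinger_sum(a, t, th1, th2), math.exp(rho), d_alpha)
```

Because $\rho_\alpha$ is proportional to $t$, the quantity $\exp\rho_\alpha$ decays geometrically in the observation time whenever $\theta_1 \ne \theta_2$ and $0 < \alpha < 1$.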

To give another illustration of the result of the previous subsection, let us first introduce the standard Wiener process $\mathbf{X}_t$ which is the Lévy process with $\nu \equiv 0$, $\delta = 0$, $\sigma = 1$ and $\theta = 1$. It defines the family of Wiener processes

$$X^{\theta}_s = \theta X_s, \qquad 0 \le s \le t,\quad \theta \in (0, \infty),$$

which are Lévy processes with $\delta = 0$, $\sigma = 1$ and $c(\theta) = 0$ so that (70) implies $b_t(\theta) = t\theta^2/2$. They are well-known models of the random fluctuations called Brownian motions. If the initial value $X_0$ is zero then $Q_{t,\theta}$ is the normal distribution with mean zero and variance $v^2 = t\theta^2$. The corresponding Lebesgue densities

$$\frac{1}{\sqrt{2\pi v^2}} \exp\{-\vartheta x^2\} \qquad \text{for } \vartheta = \frac{1}{2v^2}$$

are transformed by the mapping $x \mapsto -x^2$ of $\mathbb{R}$ on the negative half-line $(-\infty, 0)$ into the natural exponential densities $\exp\{\vartheta x - b(\vartheta)\}$ with respect to the dominating density $1/\sqrt{\pi |x|}$, where $b(\vartheta) = -\frac{1}{2}\ln\vartheta = \ln\theta + \frac{1}{2}\ln 2t$. Thus by (57)

$$\rho_\alpha(\vartheta_1, \vartheta_2) = \frac{1}{2} \ln \frac{\vartheta_1^{\alpha}\, \vartheta_2^{1-\alpha}}{\alpha\vartheta_1 + (1-\alpha)\vartheta_2} \qquad \text{(cf. (68))}.$$

This together with (61) and Theorem 5 leads to the explicit formula for the scaled Bregman distance (71) of the Wiener processes under consideration.
Geometric Brownian motions: From the abovementioned standard Wiener process one can also build up the family of geometric Brownian motions (geometric Wiener processes)

$$Y_s = \exp\{\sigma X_s + \theta s\}, \qquad 0 \le s \le t,\quad \theta \in \mathbb{R},$$

where the family-generating $\theta$ can be interpreted as drift parameters, and the volatility parameter $\sigma > 0$ is assumed to be constant all over the family. Then $\sigma X_t + \theta t$ is normally distributed with mean $\mu = \theta t$ and variance $v^2 = \sigma^2 t$, and $Y_t$ is lognormally distributed with the same parameters $\mu$ and $v^2$. By (71), the scaled Bregman distance of two geometric Brownian motions with parameters $\theta_1$, $\theta_2$ reduces to the scaled Bregman distance of two lognormal distributions $\mathrm{LN}(\theta_1 t, \sigma^2 t)$, $\mathrm{LN}(\theta_2 t, \sigma^2 t)$. As said above, it coincides with the scaled Bregman distance of two normal distributions $\mathrm{N}(\theta_1 t, \sigma^2 t)$, $\mathrm{N}(\theta_2 t, \sigma^2 t)$. This is seen also from the fact that the reparametrization

$$\vartheta = \frac{\mu}{v^2}, \qquad \tau = \frac{1}{2v^2}$$

and transformations $\mathbb{R} \mapsto \mathbb{R}^2$ similar to that from the previous example lead in both distributions $\mathrm{N}(\mu, v^2)$ and $\mathrm{LN}(\mu, v^2)$ to the same natural exponential density

$$\exp\{x \cdot (\vartheta, \tau) - b(\vartheta, \tau)\} \qquad \text{with} \qquad b(\vartheta, \tau) = \frac{\vartheta^2}{4\tau} - \frac{1}{2}\ln\tau.$$

These two distributions differ just in the dominating measures on the transformed observation space $\mathcal{X} = \mathbb{R}^2$. For $(\mu_1, v_1^2) = (\theta_1 t, \sigma^2 t)$ and $(\mu_2, v_2^2) = (\theta_2 t, \sigma^2 t)$ we get

$$(\vartheta_1, \tau_1) = \left(\frac{\theta_1}{\sigma^2}, \frac{1}{2\sigma^2 t}\right) \qquad \text{and} \qquad (\vartheta_2, \tau_2) = \left(\frac{\theta_2}{\sigma^2}, \frac{1}{2\sigma^2 t}\right)$$

and thus

$$b(\alpha(\vartheta_1, \tau_1) + (1-\alpha)(\vartheta_2, \tau_2)) - \alpha\, b(\vartheta_1, \tau_1) - (1-\alpha)\, b(\vartheta_2, \tau_2) = \frac{(\alpha\theta_1 + (1-\alpha)\theta_2)^2 - \alpha\theta_1^2 - (1-\alpha)\theta_2^2}{2\sigma^2}\, t.$$

Hence, for distributions $P_{t,\theta_1}$, $P_{t,\theta_2}$ of the geometric Brownian motions considered above we get from (57)

$$\rho_\alpha(\theta_1, \theta_2) = \frac{(\alpha\theta_1 + (1-\alpha)\theta_2)^2 - \alpha\theta_1^2 - (1-\alpha)\theta_2^2}{2\sigma^2}\, t.$$

The expression (61) can be automatically evaluated using this. Applying both these results in Theorem 5 one obtains an explicit formula for the scaled Bregman distance (71) of these geometric Brownian motions.
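The formula for $\rho_\alpha$ of geometric Brownian motions can be checked on the normal marginals $\mathrm{N}(\theta_i t, \sigma^2 t)$ to which the process distances reduce. The sketch below assumes $\rho_\alpha(\theta_1,\theta_2) = t\,[(\alpha\theta_1+(1-\alpha)\theta_2)^2 - \alpha\theta_1^2 - (1-\alpha)\theta_2^2]/(2\sigma^2)$, which simplifies to $-\alpha(1-\alpha)\,t\,(\theta_1-\theta_2)^2/(2\sigma^2)$; the integration range of twelve standard deviations, the midpoint rule and all names are illustrative choices.

```python
import math

def rho_gbm(a, t, vol, th1, th2):
    # rho_alpha for drifts th1, th2 and common volatility vol over time t
    mix = a * th1 + (1 - a) * th2
    return (mix ** 2 - a * th1 ** 2 - (1 - a) * th2 ** 2) * t / (2 * vol ** 2)

def normal_pdf(mean, var, x):
    # Density of N(mean, var)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def hellinger_integral(a, t, vol, th1, th2, steps=20000):
    # Midpoint rule for the integral of n1^a * n2^(1-a), with 0 < a < 1,
    # over the normal marginals N(th_i * t, vol^2 * t)
    var = vol ** 2 * t
    m1, m2 = th1 * t, th2 * t
    lo = min(m1, m2) - 12 * math.sqrt(var)
    hi = max(m1, m2) + 12 * math.sqrt(var)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += normal_pdf(m1, var, x) ** a * normal_pdf(m2, var, x) ** (1 - a)
    return h * total

a, t, vol, th1, th2 = 0.3, 2.0, 0.5, 0.1, 0.4
rho = rho_gbm(a, t, vol, th1, th2)
print(hellinger_integral(a, t, vol, th1, th2), math.exp(rho))
```

The simplified form shows that $\rho_\alpha$ depends on the drifts only through their difference and scales linearly with the observation time $t$.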